Add async example projects with comprehensive tests #117
Conversation
Added three example projects demonstrating py-key-value usage:

1. chat_app: Simple chat message storage
   - PydanticAdapter for type-safe message storage
   - StatisticsWrapper for operation metrics
   - TTLClampWrapper for automatic message expiration
   - LoggingWrapper for debugging
2. trading_data: Trading data cache with compression
   - PydanticAdapter for type-safe price data
   - PassthroughCacheWrapper for multi-tier caching (memory + disk)
   - CompressionWrapper for efficient storage
   - RetryWrapper for transient failure handling
   - StatisticsWrapper for cache hit/miss metrics
3. web_scraper_cache: Web scraper cache with encryption
   - PydanticAdapter for type-safe scraped data
   - FernetEncryptionWrapper for encrypted storage
   - LimitSizeWrapper to prevent huge pages (5MB limit)
   - TTLClampWrapper for controlled cache duration
   - FallbackWrapper for resilience (memory fallback)

Each example includes:

- Comprehensive test suite using pytest-asyncio
- Detailed README with usage examples and production guidance
- pyproject.toml and requirements.txt for dependencies
- Async-only implementation as requested

Fixes #105

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-authored-by: William Easton <strawgate@users.noreply.github.com>
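For orientation, here is a rough sketch of how such a wrapper stack composes. This is illustrative only: the import paths follow the patterns quoted later in this thread (e.g. `key_value.aio.wrappers.limit_size.wrapper`), and the constructor signatures are assumptions inferred from snippets in this PR, so verify them against the installed py-key-value-aio version.

```python
# Illustrative sketch; import paths and signatures are assumptions based on
# snippets quoted elsewhere in this PR, not verified against the library.
from key_value.aio.stores.memory.store import MemoryStore
from key_value.aio.wrappers.statistics.wrapper import StatisticsWrapper
from key_value.aio.wrappers.ttl_clamp.wrapper import TTLClampWrapper

store = MemoryStore()
stats = StatisticsWrapper(key_value=store)  # collect operation metrics
clamped = TTLClampWrapper(key_value=stats, min_ttl=60, max_ttl=24 * 3600)  # bound TTLs

# The examples then put a PydanticAdapter on top for type-safe models and call
# adapter.put(collection=..., key=..., value=..., ttl=...) / adapter.get(...).
```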
📝 Walkthrough

Adds three example projects under `examples/` (chat_app, trading_data, web_scraper_cache).

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
participant Client
participant ChatApp
participant Adapter as PydanticAdapter
participant Store as KeyValueStore
rect rgb(230, 245, 255)
Note over ChatApp,Store: Flow: ChatApp -> Adapter -> Wrapper stack (Logging → TTLClamp → Statistics) -> Store
end
Client->>ChatApp: send_message(conversation_id, sender, content)
ChatApp->>Adapter: put(collection="conversation:{id}", key=ts, value=ChatMessage, ttl=24h)
Adapter->>Store: put(...)
Store-->>Adapter: OK (message_id)
Adapter-->>ChatApp: message_id
ChatApp-->>Client: message_id
Client->>ChatApp: get_message(conversation_id, message_id)
ChatApp->>Adapter: get(collection="conversation:{id}", key=message_id)
Adapter->>Store: get(...)
alt found
Store-->>Adapter: serialized ChatMessage
Adapter-->>ChatApp: ChatMessage
ChatApp-->>Client: ChatMessage
else not found
Store-->>Adapter: None
Adapter-->>ChatApp: None
ChatApp-->>Client: None
end
```

```mermaid
sequenceDiagram
participant Scraper
participant WebScraperCache
participant DiskKV as DiskStore
participant MemKV as MemoryStore
rect rgb(250, 240, 230)
Note over WebScraperCache: Stack (inside→out): TTLClamp → LimitSize → FernetEncryption → Fallback(Disk ↔ Memory)
end
Scraper->>WebScraperCache: cache_page(url, content, headers, ttl)
WebScraperCache->>WebScraperCache: _url_to_key(url)
WebScraperCache->>WebScraperCache: apply TTL clamp, size check, encrypt
WebScraperCache->>DiskKV: put(key, encrypted_blob, ttl)
alt disk put fails
DiskKV-->>WebScraperCache: error
WebScraperCache->>MemKV: put(key, encrypted_blob, ttl)
end
WebScraperCache-->>Scraper: bool (cached)
```
Pre-merge checks and finishing touches: ✅ Passed checks (5 passed)
@claude please lint and fix any other issues
Claude finished @strawgate's task (View job)

Linting and fixes completed

Summary: I've successfully linted and fixed all issues in the example projects. All Python code now passes ruff linting checks, and all markdown files pass markdownlint checks.

Tasks Completed
Fixed Issues

- Python Linting:
- Markdown Linting:
- Type Checking:

Changes Committed

All changes have been committed and pushed to the branch. Commit:
- Add test file pattern to ruff ignore rules for S101
- Fix logging issues in web scraper example (use module logger)
- Fix try/except pattern in web scraper cache_page method
- Auto-format code with ruff (import sorting, blank lines)
- Fix markdown formatting issues (code block languages, list spacing)

Co-authored-by: William Easton <strawgate@users.noreply.github.com>
Actionable comments posted: 35
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (19)
- examples/README.md (1 hunks)
- examples/chat_app/README.md (1 hunks)
- examples/chat_app/__init__.py (1 hunks)
- examples/chat_app/chat_app.py (1 hunks)
- examples/chat_app/pyproject.toml (1 hunks)
- examples/chat_app/requirements.txt (1 hunks)
- examples/chat_app/test_chat_app.py (1 hunks)
- examples/trading_data/README.md (1 hunks)
- examples/trading_data/__init__.py (1 hunks)
- examples/trading_data/pyproject.toml (1 hunks)
- examples/trading_data/requirements.txt (1 hunks)
- examples/trading_data/test_trading_app.py (1 hunks)
- examples/trading_data/trading_app.py (1 hunks)
- examples/web_scraper_cache/README.md (1 hunks)
- examples/web_scraper_cache/__init__.py (1 hunks)
- examples/web_scraper_cache/pyproject.toml (1 hunks)
- examples/web_scraper_cache/requirements.txt (1 hunks)
- examples/web_scraper_cache/scraper.py (1 hunks)
- examples/web_scraper_cache/test_scraper.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (8)

- examples/web_scraper_cache/__init__.py (1)
  - examples/web_scraper_cache/scraper.py (2): ScrapedPage (33-39), WebScraperCache (42-173)
- examples/trading_data/__init__.py (1)
  - examples/trading_data/trading_app.py (2): PriceData (31-37), TradingDataCache (40-170)
- examples/web_scraper_cache/test_scraper.py (1)
  - examples/web_scraper_cache/scraper.py (7): ScrapedPage (33-39), WebScraperCache (42-173), cleanup (170-173), cache_page (107-129), get_cached_page (131-142), is_cached (157-168), invalidate_page (144-155)
- examples/web_scraper_cache/scraper.py (1)
  - examples/web_scraper_cache/test_scraper.py (1): cache (13-18)
- examples/chat_app/test_chat_app.py (1)
  - examples/chat_app/chat_app.py (6): ChatApp (35-135), ChatMessage (27-32), send_message (67-92), get_message (94-105), delete_message (107-118), get_statistics (120-135)
- examples/trading_data/test_trading_app.py (1)
  - examples/trading_data/trading_app.py (7): PriceData (31-37), TradingDataCache (40-170), cleanup (167-170), store_price (80-100), get_price (102-113), delete_price (132-143), get_cache_statistics (145-165)
- examples/chat_app/__init__.py (2)
  - examples/chat_app/test_chat_app.py (1): chat_app (12-14)
  - examples/chat_app/chat_app.py (2): ChatApp (35-135), ChatMessage (27-32)
- examples/trading_data/trading_app.py (1)
  - examples/trading_data/test_trading_app.py (1): cache (12-16)
🔇 Additional comments (7)
examples/README.md (1)

224-229: All referenced paths exist and are correct; no action needed. Verification confirms all paths referenced in examples/README.md are valid:

- `../README.md` and `../DEVELOPING.md` exist at repo root
- `../key-value/key-value-aio/src/key_value/aio/adapters/pydantic`, `../key-value/key-value-aio/src/key_value/aio/wrappers`, and `../key-value/key-value-aio/src/key_value/aio/stores` all exist in the repository

The paths are not stale and do not require replacement. Likely an incorrect or invalid review comment.

examples/trading_data/requirements.txt (1)

1-2: Dependencies in `examples/trading_data/` are already in sync. The review comment incorrectly suggests drift between `requirements.txt` and `pyproject.toml`, but verification confirms both files declare identical dependencies: py-key-value-aio>=0.2.8 and pydantic>=2.0.0. No consolidation or CI checks are needed. Likely an incorrect or invalid review comment.

examples/web_scraper_cache/requirements.txt (1)

1-3: Dependencies are already consistent and well-maintained. Both `examples/web_scraper_cache/requirements.txt` and `examples/web_scraper_cache/pyproject.toml` define identical constraints for all three dependencies. No changes are needed.

examples/web_scraper_cache/README.md (1)

200-207: No changes needed; import path is correct. The py-key-value-aio package properly exposes VaultStore at `key_value.aio.stores.vault.store`, matching the import in the code. The import will not break.

examples/chat_app/requirements.txt (1)

1-2: Verify py-key-value-aio availability; pydantic alignment confirmed. The PyPI package name "py-key-value-aio" is correct, and the code correctly uses `pydantic>=2.0.0` with `PydanticAdapter`. However, version information conflicts: the initial source confirmed 0.2.8 (released Oct 24, 2025), but PyPI currently shows versions up to 0.2.1, with piwheels listing up to 0.2.7. This may reflect a very recent release not yet reflected in all package indices. The requirement `>=0.2.8` may be unachievable depending on actual availability. The package does not declare pydantic as a dependency, so listing it separately in requirements.txt is correct and aligns with code usage (BaseModel and PydanticAdapter both require pydantic).

examples/trading_data/README.md (1)

170-191: The import path is correct and verified in the latest py-key-value-aio package. The documented import `from key_value.aio.wrappers.limit_size.wrapper import LimitSizeWrapper` matches the actual package structure. The LimitSizeWrapper class is properly defined and re-exported through the package's `__init__.py` files as expected.

examples/chat_app/test_chat_app.py (1)

16-29: Solid async test coverage for CRUD, isolation, and stats. Good breadth and clarity. No blocking issues found.

Also applies to: 30-53, 54-70, 71-75, 76-80, 81-101, 102-117
```python
if __name__ == "__main__":
    asyncio.run(main())
```
🧹 Nitpick | 🔵 Trivial
Move logging configuration under the main guard.
Set logging when running the demo.
```diff
-if __name__ == "__main__":
-    asyncio.run(main())
+if __name__ == "__main__":
+    logging.basicConfig(
+        level=logging.INFO,
+        format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
+    )
+    asyncio.run(main())
```

🤖 Prompt for AI Agents
In examples/chat_app/chat_app.py around lines 182 to 183, the logging
configuration is currently outside the __main__ guard; move the
logging.basicConfig (and any logging level/config setup) so it runs only when
the script is executed as __main__. Wrap the logging configuration immediately
before asyncio.run(main()) inside the if __name__ == "__main__": block to ensure
demo logging is set only for direct runs and not when imported.
```python
def _url_to_key(self, url: str) -> str:
    """
    Convert URL to a safe cache key.

    Uses SHA-256 hash to create a consistent, filesystem-safe key.

    Args:
        url: The URL to convert

    Returns:
        Hashed key string
    """
    return hashlib.sha256(url.encode()).hexdigest()
```
🧹 Nitpick | 🔵 Trivial
Normalize URLs before hashing to reduce duplicate keys.
Hashing raw URLs treats trivial differences (case, default ports, fragments, trailing slash) as different keys.
Apply:
```diff
+from urllib.parse import urlsplit, urlunsplit
@@
 def _url_to_key(self, url: str) -> str:
@@
-    return hashlib.sha256(url.encode()).hexdigest()
+    norm = self._normalize_url(url)
+    return hashlib.sha256(norm.encode()).hexdigest()
+
+def _normalize_url(self, url: str) -> str:
+    parts = urlsplit(url)
+    scheme = parts.scheme.lower()
+    netloc = parts.netloc.lower()
+    # Drop default ports
+    if scheme == "http" and netloc.endswith(":80"):
+        netloc = netloc.rsplit(":", 1)[0]
+    if scheme == "https" and netloc.endswith(":443"):
+        netloc = netloc.rsplit(":", 1)[0]
+    # Remove fragment; ensure non-empty path
+    path = parts.path or "/"
+    return urlunsplit((scheme, netloc, path, parts.query, ""))
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
def _url_to_key(self, url: str) -> str:
    """
    Convert URL to a safe cache key.

    Uses SHA-256 hash to create a consistent, filesystem-safe key.

    Args:
        url: The URL to convert

    Returns:
        Hashed key string
    """
    norm = self._normalize_url(url)
    return hashlib.sha256(norm.encode()).hexdigest()

def _normalize_url(self, url: str) -> str:
    from urllib.parse import urlsplit, urlunsplit

    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    netloc = parts.netloc.lower()
    # Drop default ports
    if scheme == "http" and netloc.endswith(":80"):
        netloc = netloc.rsplit(":", 1)[0]
    if scheme == "https" and netloc.endswith(":443"):
        netloc = netloc.rsplit(":", 1)[0]
    # Remove fragment; ensure non-empty path
    path = parts.path or "/"
    return urlunsplit((scheme, netloc, path, parts.query, ""))
```
🤖 Prompt for AI Agents
In examples/web_scraper_cache/scraper.py around lines 93 to 106, the current
_url_to_key hashes the raw URL which creates duplicate keys for trivial
differences; modify the function to first normalize the URL: parse with
urllib.parse, lowercase scheme and hostname, remove default ports (remove :80
for http and :443 for https), drop the fragment, preserve the query, normalize
the path by converting empty path to "/" and removing a trailing slash unless
the path is just "/", then reconstruct the normalized URL string and SHA-256
hash that normalized string to produce the key.
```python
async def get_cached_page(self, url: str) -> ScrapedPage | None:
    """
    Retrieve a cached page.

    Args:
        url: Page URL

    Returns:
        ScrapedPage if found and valid, None otherwise
    """
    key = self._url_to_key(url)
    return await self.adapter.get(collection="pages", key=key)
```
🧹 Nitpick | 🔵 Trivial
Harden reads against decryption/validation errors.
A bad key or corrupt entry will raise and bubble to callers. Return None and log instead.
Apply:
```diff
 async def get_cached_page(self, url: str) -> ScrapedPage | None:
@@
-    key = self._url_to_key(url)
-    return await self.adapter.get(collection="pages", key=key)
+    key = self._url_to_key(url)
+    try:
+        return await self.adapter.get(collection="pages", key=key)
+    except Exception as e:
+        logging.error("Failed to read cached page %s: %s", url, e)
+        return None
```

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In examples/web_scraper_cache/scraper.py around lines 131 to 143, the
get_cached_page call can raise on decryption/validation errors; wrap the await
self.adapter.get(...) in a try/except to catch Exception, log the error and
return None instead of allowing it to bubble. Use self.logger.warning(...) if
the scraper has a logger attribute (falling back to logging.warning if not),
include the URL/key and exception details in the log, and ensure the function
still returns Optional[ScrapedPage] (None on error).
```python
async def is_cached(self, url: str) -> bool:
    """
    Check if a page is cached.

    Args:
        url: Page URL

    Returns:
        True if page is cached, False otherwise
    """
    page = await self.get_cached_page(url)
    return page is not None
```
🧹 Nitpick | 🔵 Trivial
Consider a cheap existence check.
is_cached() fetches full values. If the API exposes an exists/has method, prefer that to avoid decoding/decrypting payloads when only presence is needed.
Do you have a KeyValue.exists()/has() on PydanticAdapter or underlying store? If so, we can switch to it for O(1) presence checks.
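A minimal sketch of what that could look like, assuming a hypothetical `exists()` method; the question above is precisely whether such a method exists, so treat this as illustrative rather than as the library's API:

```python
async def is_cached(self, url: str) -> bool:
    """Presence check that avoids decrypting full payloads when possible."""
    key = self._url_to_key(url)
    # `exists` is hypothetical; fall back to the current fetch-based check.
    exists = getattr(self.adapter, "exists", None)
    if exists is not None:
        return await exists(collection="pages", key=key)
    return await self.get_cached_page(url) is not None
```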
🤖 Prompt for AI Agents
In examples/web_scraper_cache/scraper.py around lines 157 to 169, is_cached
currently calls get_cached_page which fetches and decodes the full payload;
change it to use a cheap existence check instead: call the adapter/store's
exists()/has() method (e.g., self.adapter.exists(key) or underlying
kv.exists/has) and return that boolean; if the adapter/store does not expose
exists/has, add a lightweight exists method on PydanticAdapter that queries the
underlying store for key existence (without decoding/decrypting value) and use
that from is_cached as a fallback.
```python
import pytest
from cryptography.fernet import Fernet

from scraper import ScrapedPage, WebScraperCache
```
Mark module for asyncio to prevent accidental sync collection failures.
Without config, async tests may be skipped/fail. Either add module-level mark or ensure pytest.ini sets asyncio_mode=auto.
Apply:
```diff
 import pytest
 from cryptography.fernet import Fernet
 from scraper import ScrapedPage, WebScraperCache
+
+pytestmark = pytest.mark.asyncio
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
import pytest
from cryptography.fernet import Fernet
from scraper import ScrapedPage, WebScraperCache

pytestmark = pytest.mark.asyncio
```
🤖 Prompt for AI Agents
In examples/web_scraper_cache/test_scraper.py around lines 3 to 8, async tests
can be collected as sync and fail if pytest's asyncio plugin isn't configured;
import pytest at the top and add a module-level mark by setting pytestmark =
pytest.mark.asyncio so all async tests in this file are treated correctly
(alternatively ensure pytest.ini has asyncio_mode=auto).
@claude can you check the workflow and the coderabbit feedback and make progress on the PR
Claude finished @strawgate's task (View job)

Fixed CodeRabbit critical and major issues ✅

Tasks Completed

Changes Made

- 🔴 Critical Issues Fixed:
- 🟠 Major Issues Fixed:

Verification

- Linting:
- Type Checking:

Remaining CodeRabbit Feedback

The remaining 🔵 minor/nitpick issues from CodeRabbit are mostly suggestions for improvements and don't block the PR. These can be addressed in follow-up PRs if desired.

Next Steps

The critical fixes have been committed and pushed. The CI workflow should now run successfully. Once the workflow completes, the PR will be ready for final review and merge.

Commit:
- Change absolute imports to relative imports in all __init__.py files
- Fix security issue: log encryption key fingerprints instead of actual keys
- Add pytest asyncio configuration (asyncio_mode = auto) to all examples
- Constrain pydantic to <3.0.0 in all examples

Co-authored-by: William Easton <strawgate@users.noreply.github.com>
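For reference, the pytest configuration change described in this commit corresponds to a small pyproject.toml addition (a sketch consistent with the `asyncio_mode = "auto"` snippets quoted later in this review):

```toml
[tool.pytest.ini_options]
asyncio_mode = "auto"
```

With this in place, pytest-asyncio treats every `async def test_*` function as an asyncio test without per-test `@pytest.mark.asyncio` decorators.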
Actionable comments posted: 1
♻️ Duplicate comments (5)
examples/web_scraper_cache/scraper.py (5)

28-29: Avoid configuring root logging at import time. Calling `basicConfig` at module level affects host applications and test suites. Move the configuration inside `main()` or behind a guard like `if __name__ == "__main__":`.

75-85: Enforce size limit after encryption to cap actual stored bytes. The current order checks size before encryption. Fernet adds overhead, so entries near 5MB may exceed the cap on disk. Wrap `LimitSizeWrapper` inside `FernetEncryptionWrapper` so the size check applies to encrypted bytes.

95-107: Normalize URLs before hashing to reduce duplicate keys. Hashing raw URLs treats trivial differences (case, default ports, fragments, trailing slash) as different keys. Consider normalizing URLs (lowercase scheme/host, remove default ports, handle trailing slashes) before hashing to improve cache efficiency.

134-145: Harden reads against decryption/validation errors. A bad key or corrupt entry will raise and bubble to callers. Wrap `await self.adapter.get(...)` in a try/except to catch exceptions, log the error, and return `None` instead of propagating the exception.

160-171: Consider a cheap existence check. `is_cached()` fetches and decodes full values. If the underlying store or adapter exposes an `exists()`/`has()` method, prefer that to avoid decrypting/decoding payloads when only presence is needed.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (10)
- examples/chat_app/__init__.py (1 hunks)
- examples/chat_app/pyproject.toml (1 hunks)
- examples/chat_app/requirements.txt (1 hunks)
- examples/trading_data/__init__.py (1 hunks)
- examples/trading_data/pyproject.toml (1 hunks)
- examples/trading_data/requirements.txt (1 hunks)
- examples/web_scraper_cache/__init__.py (1 hunks)
- examples/web_scraper_cache/pyproject.toml (1 hunks)
- examples/web_scraper_cache/requirements.txt (1 hunks)
- examples/web_scraper_cache/scraper.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)

- examples/trading_data/__init__.py (1)
  - examples/trading_data/trading_app.py (2): PriceData (30-36), TradingDataCache (39-168)
- examples/chat_app/__init__.py (2)
  - examples/chat_app/test_chat_app.py (1): chat_app (12-14)
  - examples/chat_app/chat_app.py (2): ChatApp (34-134), ChatMessage (26-31)
- examples/web_scraper_cache/__init__.py (1)
  - examples/web_scraper_cache/scraper.py (2): ScrapedPage (33-39), WebScraperCache (42-175)
- examples/web_scraper_cache/scraper.py (1)
  - examples/web_scraper_cache/test_scraper.py (1): cache (12-17)
🔇 Additional comments (13)

examples/chat_app/requirements.txt (1)

1-2: LGTM! Dependencies are properly versioned. The dependency specifications are correct with appropriate version constraints. The Pydantic version is properly bounded to prevent accidental upgrades to v3, ensuring API stability.

examples/chat_app/pyproject.toml (4)

1-5: LGTM! Project metadata is properly configured. The project metadata is correctly structured with an appropriate Python version requirement (>=3.10) for modern async features.

6-9: LGTM! Previous feedback incorporated. The Pydantic dependency is now properly bounded to prevent accidental v3 upgrades, addressing the previous review feedback.

11-15: LGTM! Dev dependencies are appropriate. The test dependencies are correctly configured with appropriate version constraints for async testing.

17-22: LGTM! Previous feedback incorporated. The pytest asyncio configuration has been added as suggested in previous review feedback, ensuring async tests run reliably. The build system configuration is also properly set up.

examples/chat_app/__init__.py (1)

1-5: LGTM! Previous feedback incorporated. The package initialization is correctly implemented with relative imports, addressing the previous review feedback about import shadowing. The public API is properly exposed via `__all__`.

examples/trading_data/__init__.py (1)

1-5: LGTM! Relative import correctly implemented. The package initializer properly uses relative imports and controls the public API through `__all__`. The previous review concern about absolute imports has been addressed.

examples/trading_data/pyproject.toml (1)

17-18: Pytest asyncio configuration correctly added. The `asyncio_mode = "auto"` configuration addresses the previous review feedback. The tests will now run without requiring explicit async markers. Note: the previous review also suggested `testpaths = ["."]`, but this is pytest's default behavior and can be omitted unless you need a different path.

examples/web_scraper_cache/requirements.txt (1)

1-3: LGTM! The dependencies are appropriate for the web scraper cache example and align with the pyproject.toml configuration.

examples/web_scraper_cache/pyproject.toml (1)

18-19: Pytest async configuration properly added. The `asyncio_mode = "auto"` setting ensures async tests run without requiring per-test decorators, addressing the previous review feedback.

examples/web_scraper_cache/__init__.py (1)

3-3: Package-relative import correctly implemented. The relative import resolves the previous issue where absolute imports would fail when the package is used outside the examples directory.

examples/web_scraper_cache/scraper.py (2)

55-60: Encryption key fingerprinting properly implemented. The code now generates and logs a SHA-256 fingerprint instead of the raw encryption key, preventing secret leakage to logs.

200-206: Encryption key fingerprint properly used in demo. The demo now prints only a SHA-256 fingerprint instead of the raw encryption key, preventing secret leakage.
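As a rough illustration of the fingerprinting pattern being praised here (a sketch, not the example's actual code; variable names are assumptions):

```python
import hashlib

from cryptography.fernet import Fernet

key = Fernet.generate_key()
fernet = Fernet(key)

# Log only a short fingerprint of the key, never the key itself.
fingerprint = hashlib.sha256(key).hexdigest()[:16]
print(f"Encryption key fingerprint (sha256): {fingerprint}")
```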
```text
py-key-value-aio>=0.2.8
pydantic>=2.0.0,<3.0.0
```
🧹 Nitpick | 🔵 Trivial
Consider removing this file in favor of pyproject.toml.
Having both requirements.txt and pyproject.toml creates duplication. Since pyproject.toml is the modern standard (PEP 621) and already declares these dependencies, you can install directly from it using `pip install -e .` or `pip install -e ".[dev]"` for dev dependencies.

To remove the duplication, delete this file and use:

```bash
pip install -e .
```

🤖 Prompt for AI Agents
In examples/trading_data/requirements.txt around lines 1 to 2, this
requirements.txt duplicates dependencies already declared in pyproject.toml;
remove the file to avoid duplication and update README or CI docs to instruct
users to install from pyproject.toml using pip install -e . (use pip install -e
".[dev]" where dev extras are needed) Ensure any CI/test workflows or
documentation that referenced requirements.txt are updated to install from the
package metadata instead.
examples/trading_data/README.md (outdated)

```python
cache_store = PassthroughCacheWrapper(
    cache=memory_cache,
    key_value=CompressionWrapper(key_value=disk_cache),
)
```
the disk cache is already compressed so let's just skip the compression wrapper and maybe do a logging wrapper?
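A minimal sketch of that swap, assuming `LoggingWrapper` follows the same `key_value=` constructor convention as the other wrappers quoted in this PR (unverified; note also that a later commit in this PR renames PassthroughCacheWrapper's parameters to primary_key_value/cache_key_value):

```python
# Hypothetical import path, mirroring the other wrappers in this repo.
from key_value.aio.wrappers.logging.wrapper import LoggingWrapper

# The disk store already compresses entries, so wrap it with logging
# instead of a redundant CompressionWrapper.
cache_store = PassthroughCacheWrapper(
    cache=memory_cache,
    key_value=LoggingWrapper(key_value=disk_cache),
)
```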
@claude please incorporate PR feedback
Claude finished @strawgate's task (View job)

PR feedback incorporated successfully ✅

Tasks Completed

Changes Implemented

- From strawgate's comment (README.md:229):
- From CodeRabbit nitpick/minor feedback:

Verification

Linting:

```bash
$ ruff check examples/
All checks passed!
```

Type Checking:

```bash
$ basedpyright examples/
0 errors, 0 warnings, 0 notes
```

Changes Committed

All changes have been committed and pushed to the branch. Commit:
- Replace CompressionWrapper with LoggingWrapper in trading_data example (disk cache is already compressed, so logging is more useful)
- Move logging.basicConfig() from module-level to main() in all examples (prevents side effects on import)
- Fix fragile wrapper attribute chains by using explicit variable references (improves maintainability and removes type: ignore comments)
- Update README documentation to match code changes

Addresses feedback from @strawgate and CodeRabbit review.

Co-authored-by: William Easton <strawgate@users.noreply.github.com>
Actionable comments posted: 6
♻️ Duplicate comments (9)
examples/chat_app/chat_app.py (1)

134-134: Consider moving logging configuration to the `__main__` guard. While having `logging.basicConfig` inside `main()` is better than import-time configuration, the previous review feedback suggested placing it directly in the `__main__` guard. This ensures logging setup happens only when running the script directly, not when calling `main()` programmatically.

```diff
 async def main():
     """Demonstrate the chat application."""
-    # Configure logging for the demo
-    logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(name)s - %(levelname)s - %(message)s")
-
     chat = ChatApp()
     # ...

 if __name__ == "__main__":
+    logging.basicConfig(
+        level=logging.INFO,
+        format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
+    )
     asyncio.run(main())
```

examples/web_scraper_cache/scraper.py (4)

68-83: Enforce size limit after encryption to cap actual stored bytes. The current wrapper order checks size before encryption. Since Fernet adds overhead (~57 bytes + the data), entries near 5MB may exceed the intended storage cap. Wrap `LimitSizeWrapper` inside `FernetEncryptionWrapper` to enforce the limit on encrypted bytes.

Apply this diff:

```diff
 primary_with_wrappers = TTLClampWrapper(
-    key_value=LimitSizeWrapper(
-        key_value=FernetEncryptionWrapper(
-            key_value=FallbackWrapper(key_value=disk_store, fallback=fallback_store),
-            key=encryption_key,
-        ),
-        max_size_bytes=5 * 1024 * 1024,  # 5MB limit
-    ),
+    key_value=FernetEncryptionWrapper(
+        key_value=LimitSizeWrapper(
+            key_value=FallbackWrapper(key_value=disk_store, fallback=fallback_store),
+            max_size_bytes=5 * 1024 * 1024,  # 5MB limit (enforced on encrypted bytes)
+        ),
+        key=encryption_key,
+    ),
     min_ttl=3600,  # 1 hour minimum
     max_ttl=7 * 24 * 3600,  # 7 days maximum
 )
```

93-105: Normalize URLs before hashing to reduce duplicate keys. Hashing the raw URL treats trivial differences (scheme/hostname case, default ports, fragments, trailing slashes) as distinct keys, causing redundant cache entries.

Apply this diff to add normalization:

```diff
+from urllib.parse import urlsplit, urlunsplit
@@
 def _url_to_key(self, url: str) -> str:
     """
     Convert URL to a safe cache key.

-    Uses SHA-256 hash to create a consistent, filesystem-safe key.
+    Normalizes the URL then uses SHA-256 hash to create a consistent, filesystem-safe key.

     Args:
         url: The URL to convert

     Returns:
         Hashed key string
     """
-    return hashlib.sha256(url.encode()).hexdigest()
+    normalized = self._normalize_url(url)
+    return hashlib.sha256(normalized.encode()).hexdigest()
+
+def _normalize_url(self, url: str) -> str:
+    """Normalize URL to reduce duplicate cache entries."""
+    parts = urlsplit(url)
+    scheme = parts.scheme.lower()
+    netloc = parts.netloc.lower()
+    # Remove default ports
+    if scheme == "http" and netloc.endswith(":80"):
+        netloc = netloc.rsplit(":", 1)[0]
+    elif scheme == "https" and netloc.endswith(":443"):
+        netloc = netloc.rsplit(":", 1)[0]
+    # Ensure non-empty path; drop fragment
+    path = parts.path or "/"
+    return urlunsplit((scheme, netloc, path, parts.query, ""))
```

132-143: Harden reads against decryption/validation errors. A wrong encryption key or corrupted entry will raise an exception that bubbles to callers. Catch these errors and return `None` instead.

Apply this diff:

```diff
 async def get_cached_page(self, url: str) -> ScrapedPage | None:
@@
     key = self._url_to_key(url)
-    return await self.adapter.get(collection="pages", key=key)
+    try:
+        return await self.adapter.get(collection="pages", key=key)
+    except Exception as e:
+        logger.error("Failed to read cached page %s: %s", url, e)
+        return None
```

158-169: Consider a cheap existence check. `is_cached()` fetches the full value, requiring decryption and deserialization. If the adapter or underlying store exposes an `exists()` or `has()` method, use that for O(1) presence checks without payload processing.

If `PydanticAdapter` or the underlying key-value store provides an existence check method, refactor to:

```python
async def is_cached(self, url: str) -> bool:
    """Check if a page is cached without fetching the full payload."""
    key = self._url_to_key(url)
    # If adapter/store has exists() or has():
    return await self.adapter.exists(collection="pages", key=key)
    # Otherwise fall back to the current implementation
```

examples/trading_data/README.md (2)

45-51: Add working directory context and clarify uv usage. Run commands from the example folder and note uv is an editable install.

Apply:

```diff
+# From the example directory
+cd examples/trading_data
+
 # Install dependencies
 pip install -r requirements.txt
@@
-# Or using uv
-uv pip install -e .
+# Or using uv (editable install of the example)
+uv pip install -e .
```

139-145: Undefined variable in Retry example; use the defined disk cache.

Apply:

```diff
 retry_store = RetryWrapper(
-    key_value=base_store,
+    key_value=disk_cache,
     max_retries=3,
     base_delay=0.1
 )
```

examples/trading_data/trading_app.py (2)

141-141: Return type includes a float (hit_rate_percent).

Apply:

```diff
-def get_cache_statistics(self) -> dict[str, int]:
+def get_cache_statistics(self) -> dict[str, int | float]:
```

115-116: Docstring should reflect the "latest" alias now being maintained.

Apply:

```diff
-    Note: This is a simplified implementation. In production, you'd want
-    to maintain a separate "latest" key or use a time-series database.
+    Note: For demo convenience, store_price also maintains a "latest" alias
+    per symbol. For high write rates, consider a time-series database.
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (4)
examples/chat_app/chat_app.py(1 hunks)examples/trading_data/README.md(1 hunks)examples/trading_data/trading_app.py(1 hunks)examples/web_scraper_cache/scraper.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
examples/trading_data/trading_app.py (1)
examples/trading_data/test_trading_app.py (1)
cache(11-15)
examples/web_scraper_cache/scraper.py (2)
examples/trading_data/trading_app.py (1)
cleanup(163-165)examples/web_scraper_cache/test_scraper.py (1)
cache(12-17)
🔇 Additional comments (9)
examples/chat_app/chat_app.py (1)
39-59: Excellent refactoring of the wrapper stack! The wrapper stack now uses explicit intermediate variables (stats, ttl_clamped, wrapped_store) and stores a direct reference to the StatisticsWrapper. This addresses the previous review feedback perfectly, eliminating the fragile attribute chain and the need for type: ignore.
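For readers skimming the thread, a sketch of the pattern being praised (the variable names come from the comment above; constructor signatures are assumptions based on earlier snippets in this PR):

```python
# Build the stack with named intermediates instead of one nested expression,
# keeping a direct handle on the StatisticsWrapper for later metrics reads.
store = MemoryStore()
stats = StatisticsWrapper(key_value=store)
ttl_clamped = TTLClampWrapper(key_value=stats, min_ttl=60, max_ttl=24 * 3600)
wrapped_store = LoggingWrapper(key_value=ttl_clamped)

stats_wrapper = stats  # direct reference; no fragile .key_value attribute chain
```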
examples/web_scraper_cache/scraper.py (7)

1-28: Module setup follows best practices. The module structure is clean with proper imports and a module-level logger. Logging configuration has been correctly moved to the `main()` function rather than import time.

31-37: LGTM! The `ScrapedPage` model is well-structured with appropriate field types for web scraping use cases.

107-130: LGTM! The `cache_page` method properly handles errors and uses timezone-aware timestamps. The try/except pattern ensures graceful failure without bubbling exceptions.

145-156: LGTM! The `invalidate_page` method correctly deletes cached entries and returns the operation result.

171-173: LGTM! The `cleanup()` method is appropriately stubbed for an example project, with clear guidance for production usage.

198-207: Excellent security practices in the demo. Logging is configured inside `main()` (not at import time), and only the key fingerprint is displayed, never the raw encryption key. This properly demonstrates secure key handling.
246-249: Size limit test may behave unexpectedly. The 6MB test is intended to demonstrate the size limit, but due to the wrapper order (size check before encryption), the behavior depends on whether the test expects rejection based on plaintext or encrypted size. If the intent is to test the final stored size, note that the current implementation checks plaintext size.
Verify the expected behavior: should the 5MB limit apply to plaintext or encrypted bytes?
examples/trading_data/trading_app.py (1)
168-171: Good: logging configured at runtime, not import time. Please confirm tests don't reconfigure logging globally elsewhere.
```python
message = ChatMessage(sender=sender, content=content, timestamp=datetime.now(tz=timezone.utc))

# Use timestamp as message ID for chronological ordering
message_id = message.timestamp.isoformat()
```
Potential message ID collision with timestamp-based keys.
Using timestamp.isoformat() as the message ID could lead to collisions if multiple messages are sent within the same microsecond (e.g., in automated tests or high-throughput scenarios). Consider appending a UUID suffix or counter to guarantee uniqueness.
Example fix:
```diff
+import uuid
+
 # ...
-# Use timestamp as message ID for chronological ordering
-message_id = message.timestamp.isoformat()
+# Use timestamp + UUID for chronological ordering with uniqueness guarantee
+message_id = f"{message.timestamp.isoformat()}-{uuid.uuid4().hex[:8]}"
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
message = ChatMessage(sender=sender, content=content, timestamp=datetime.now(tz=timezone.utc))

# Use timestamp + UUID for chronological ordering with uniqueness guarantee
message_id = f"{message.timestamp.isoformat()}-{uuid.uuid4().hex[:8]}"
```
🤖 Prompt for AI Agents
In examples/chat_app/chat_app.py around lines 72 to 75, the message_id is
currently derived solely from message.timestamp.isoformat(), which can collide
if multiple messages share the same microsecond; change the ID to a
guaranteed-unique value by appending a UUID (or an incrementing counter) to the
timestamp: generate a uuid4() (ensure you import uuid) and combine it with the
isoformat timestamp (e.g., timestamp_iso + "_" + uuid4().hex) when assigning
message_id so each message ID is globally unique.
```python
if isinstance(self.stats_wrapper, StatisticsWrapper):
    return {
        "total_puts": self.stats_wrapper.total_puts,
        "total_gets": self.stats_wrapper.total_gets,
        "total_deletes": self.stats_wrapper.total_deletes,
        "get_hits": self.stats_wrapper.get_hits,
        "get_misses": self.stats_wrapper.get_misses,
    }
return {}
```
🧹 Nitpick | 🔵 Trivial
Unnecessary isinstance check for stats_wrapper.
The isinstance check is redundant since self.stats_wrapper is always assigned a StatisticsWrapper instance in __init__ (line 58). You can safely remove the check or add a type assertion for clarity.
```diff
 def get_statistics(self) -> dict[str, int]:
-    """
-    Get operation statistics.
-
-    Returns:
-        Dictionary with operation counts (puts, gets, deletes, etc.)
-    """
-    if isinstance(self.stats_wrapper, StatisticsWrapper):
-        return {
-            "total_puts": self.stats_wrapper.total_puts,
-            "total_gets": self.stats_wrapper.total_gets,
-            "total_deletes": self.stats_wrapper.total_deletes,
-            "get_hits": self.stats_wrapper.get_hits,
-            "get_misses": self.stats_wrapper.get_misses,
-        }
-    return {}
+    """
+    Get operation statistics.
+
+    Returns:
+        Dictionary with operation counts (puts, gets, deletes, etc.)
+    """
+    return {
+        "total_puts": self.stats_wrapper.total_puts,
+        "total_gets": self.stats_wrapper.total_gets,
+        "total_deletes": self.stats_wrapper.total_deletes,
+        "get_hits": self.stats_wrapper.get_hits,
+        "get_misses": self.stats_wrapper.get_misses,
+    }
```

🤖 Prompt for AI Agents
In examples/chat_app/chat_app.py around lines 120 to 128, the
isinstance(self.stats_wrapper, StatisticsWrapper) guard is redundant because
__init__ always assigns a StatisticsWrapper; remove the conditional and directly
return the dict constructed from self.stats_wrapper's attributes (total_puts,
total_gets, total_deletes, get_hits, get_misses). If you want an explicit safety
check instead, replace the runtime check with a short assert or a type comment
(e.g., assert isinstance(self.stats_wrapper, StatisticsWrapper) or # type:
StatisticsWrapper) and then return the dict unconditionally.
```bash
python trading_app.py
```
🧹 Nitpick | 🔵 Trivial
Prefix demo commands with cd into the example.
Apply:
```diff
-python trading_app.py
+cd examples/trading_data
+python trading_app.py
```

🤖 Prompt for AI Agents
In examples/trading_data/README.md around lines 85 to 87, the demo command block
doesn't change directory into the example before running the script; update the
fenced bash block to first run "cd examples/trading_data" and then "python
trading_app.py" so users are in the correct working directory when running the
demo.
```bash
# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest test_trading_app.py -v
```
🧹 Nitpick | 🔵 Trivial
Also cd into the example when running tests.
Apply:
```diff
+cd examples/trading_data
+
 # Install dev dependencies
 pip install -e ".[dev]"
@@
 # Run tests
 pytest test_trading_app.py -v
```
🤖 Prompt for AI Agents
In examples/trading_data/README.md around lines 91 to 97, the test instructions
omit changing into the example directory; update the code block so the commands
first run cd examples/trading_data, then install dev dependencies (pip install
-e ".[dev]") and then run pytest test_trading_app.py -v, ensuring the cd command
appears before the install and test commands in the fenced bash block.
### Compression

CompressionWrapper reduces storage requirements:

```python
compressed_store = CompressionWrapper(key_value=disk_cache)
```

Especially effective for:

- Historical price data with many data points
- JSON-serialized objects with repeated keys
- Text-heavy data structures
Remove Compression section to match current design (per maintainer).
Disk cache is already compressed; this section confuses readers. Consider replacing with a brief LoggingWrapper example if needed.
Apply:
```diff
-### Compression
-
-CompressionWrapper reduces storage requirements:
-
-```python
-compressed_store = CompressionWrapper(key_value=disk_cache)
-```
-
-Especially effective for:
-
-- Historical price data with many data points
-- JSON-serialized objects with repeated keys
-- Text-heavy data structures
```

I can add a concise LoggingWrapper snippet instead, say one mirroring the stack used in the code.
🤖 Prompt for AI Agents
In examples/trading_data/README.md around lines 121 to 134, remove the entire
"Compression" section (the CompressionWrapper example and the following bullet
list) because disk cache is already compressed and this section is misleading;
optionally replace it with a short LoggingWrapper example that mirrors the
project's stack (showing wrapping the disk_cache with LoggingWrapper and any
relevant instantiation notes) so readers see a useful wrapper example instead of
CompressionWrapper.
```python
# Use timestamp as key for chronological ordering
data_id = price_data.timestamp.isoformat()

await self.adapter.put(collection=f"symbol:{symbol}", key=data_id, value=price_data, ttl=ttl)

return data_id
```
Filesystem‑safe keys/collections and write the “latest” alias.
Colons in collections and ISO keys break on Windows; also get_latest_price relies on a key never set.
Apply:
```diff
@@
-    # Use timestamp as key for chronological ordering
-    data_id = price_data.timestamp.isoformat()
+    # Filesystem-safe, lexicographically sortable key
+    data_id = price_data.timestamp.strftime("%Y%m%dT%H%M%S%fZ")
@@
-    await self.adapter.put(collection=f"symbol:{symbol}", key=data_id, value=price_data, ttl=ttl)
+    col = self._collection_for(symbol)
+    await self.adapter.put(collection=col, key=data_id, value=price_data, ttl=ttl)
+    # Maintain a fast "latest" pointer for convenience reads
+    await self.adapter.put(collection=col, key="latest", value=price_data, ttl=ttl)
@@
-    return await self.adapter.get(collection=f"symbol:{symbol}", key=data_id)
+    return await self.adapter.get(collection=self._collection_for(symbol), key=data_id)
@@
-    return await self.adapter.get(collection=f"symbol:{symbol}", key="latest")
+    return await self.adapter.get(collection=self._collection_for(symbol), key="latest")
@@
-    return await self.adapter.delete(collection=f"symbol:{symbol}", key=data_id)
+    return await self.adapter.delete(collection=self._collection_for(symbol), key=data_id)
```

Additions outside the shown ranges:
```diff
@@
-import asyncio
-import logging
+import asyncio
+import logging
+import re
@@
 class TradingDataCache:
@@
     def __init__(self, cache_dir: str = ".trading_cache"):
         ...
+    @staticmethod
+    def _collection_for(symbol: str) -> str:
+        """Filesystem-safe collection name."""
+        safe = re.sub(r'[^A-Za-z0-9._-]', '_', symbol)
+        return f"symbol-{safe}"
```

Also applies to: 109-110, 126-127, 139-139
@claude can you update these examples now that we've done a significant refactor of the project?
Claude finished @strawgate's task (View job)

Successfully updated examples after refactor ✅

I've successfully updated all three example projects to work with the refactored py-key-value API. All tests are passing and the code is linted and type-checked.

Key Changes Made

- Dependency & Build Changes:
- API Changes Updated (8 breaking changes):
- Architecture Improvements:

Test Results

Files Changed

Commit:
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
pyproject.toml (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-26T16:57:34.087Z
Learnt from: CR
PR: strawgate/py-key-value#0
File: AGENTS.md:0-0
Timestamp: 2025-10-26T16:57:34.087Z
Learning: Applies to **/*.py : Use Ruff for Python formatting and linting (configured in pyproject.toml)
Applied to files:
pyproject.toml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: test_quick (3.10, windows-latest, key-value/key-value-aio)
- GitHub Check: test_quick (3.10, macos-latest, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, windows-latest, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, windows-2022, key-value/key-value-aio)
- GitHub Check: test_quick (3.10, windows-2022, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, ubuntu-latest, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, ubuntu-22.04, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, macos-14, key-value/key-value-aio)
- GitHub Check: test_quick (3.10, ubuntu-latest, key-value/key-value-aio)
- GitHub Check: test_quick (3.10, ubuntu-22.04, key-value/key-value-aio)
| "**/test_*.py" = [ | ||
| "S101", # Ignore asserts | ||
| "DTZ005", # Ignore datetime.UTC | ||
| "PLR2004", # Ignore magic values | ||
| "E501", # Ignore line length | ||
| ] |
🛠️ Refactor suggestion | 🟠 Major
Inconsistent ignore sets between test file patterns.
The new **/test_*.py pattern is missing ARG001 and ARG002 ignores that are present in the existing **/tests/*.py pattern. This could cause unused argument warnings in example project tests if fixtures or helper functions have unused parameters.
For consistency and to prevent false positives in the new test suite, align the ignore sets:
"**/test_*.py" = [
"S101", # Ignore asserts
"DTZ005", # Ignore datetime.UTC
"PLR2004", # Ignore magic values
"E501", # Ignore line length
+ "ARG001", # Unused argument, Pyright captures this for us
+ "ARG002", # Unused argument, Pyright captures this for us
]🤖 Prompt for AI Agents
pyproject.toml around lines 77 to 82: the ignore list for the new pattern
"**/test_*.py" is missing ARG001 and ARG002 which are present in the existing
"**/tests/*.py" pattern, causing potential unused-argument false positives;
update the "**/test_*.py" ignore list to include ARG001 and ARG002 so both test
patterns have the same ignore set.
- Add store extras ([memory], [disk]) to all dependency files
- Fix hatch build configuration in pyproject.toml files
- Update TTLClampWrapper to require both min_ttl and max_ttl
- Update StatisticsWrapper API to use new nested structure
- Update DiskStore parameter from root_directory to directory
- Update RetryWrapper parameter from base_delay to initial_delay
- Update PassthroughCacheWrapper parameters to primary_key_value/cache_key_value
- Update FernetEncryptionWrapper to use fernet instance instead of key
- Update FallbackWrapper parameters to primary_key_value/fallback_key_value
- Update LimitSizeWrapper parameter from max_size_bytes to max_size
- Move StatisticsWrapper to top of stack in trading_data for better tracking
- Add raise_on_decryption_error=False for graceful encryption failures

All tests passing: chat_app (7/7), trading_data (9/9), web_scraper_cache (13/13)

Co-authored-by: William Easton <strawgate@users.noreply.github.com>
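A rough before/after sketch of the renames listed in this commit message. The constructor calls are reconstructed from the PR's own snippets; treat the exact signatures as assumptions and check the library's current API (imports omitted, following the store/wrapper import paths quoted earlier in this thread).

```python
# New-style constructor calls after the refactor, per the commit message above.
store = DiskStore(directory=".cache")  # was root_directory=

# TTLClampWrapper now requires both bounds.
clamped = TTLClampWrapper(key_value=store, min_ttl=3600, max_ttl=7 * 24 * 3600)

retried = RetryWrapper(key_value=clamped, max_retries=3, initial_delay=0.1)  # was base_delay=

# PassthroughCacheWrapper: cache=/key_value= became cache_key_value=/primary_key_value=
cached = PassthroughCacheWrapper(primary_key_value=retried, cache_key_value=MemoryStore())

encrypted = FernetEncryptionWrapper(
    key_value=cached,
    fernet=Fernet(Fernet.generate_key()),  # was key=<bytes>
    raise_on_decryption_error=False,
)

limited = LimitSizeWrapper(key_value=encrypted, max_size=5 * 1024 * 1024)  # was max_size_bytes=
```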
Actionable comments posted: 1
♻️ Duplicate comments (10)
examples/chat_app/chat_app.py (2)
74-75: Potential message ID collision with timestamp-based keys.Using
timestamp.isoformat()as the message ID could lead to collisions if multiple messages are sent within the same microsecond (e.g., in automated tests or high-throughput scenarios). While this may be acceptable for a simple example, consider documenting this limitation or appending a UUID suffix for uniqueness.Example fix:
+import uuid + # ... - # Use timestamp as message ID for chronological ordering - message_id = message.timestamp.isoformat() + # Use timestamp + UUID for chronological ordering with uniqueness guarantee + message_id = f"{message.timestamp.isoformat()}-{uuid.uuid4().hex[:8]}"
120-135: Redundant isinstance check; aggregation logic is sound.The
isinstancecheck is redundant sinceself.stats_wrapperis always assigned aStatisticsWrapperinstance in__init__(line 58). However, the statistics aggregation across collections is a good implementation for multi-conversation tracking.def get_statistics(self) -> dict[str, int]: """ Get operation statistics across all conversations. Returns: Dictionary with aggregated operation counts (puts, gets, deletes, etc.) """ - if isinstance(self.stats_wrapper, StatisticsWrapper): - # Aggregate statistics across all collections (conversations) - total_puts = sum(coll_stats.put.count for coll_stats in self.stats_wrapper.statistics.collections.values()) - total_gets = sum(coll_stats.get.count for coll_stats in self.stats_wrapper.statistics.collections.values()) - total_deletes = sum(coll_stats.delete.count for coll_stats in self.stats_wrapper.statistics.collections.values()) - get_hits = sum(coll_stats.get.hit for coll_stats in self.stats_wrapper.statistics.collections.values()) - get_misses = sum(coll_stats.get.miss for coll_stats in self.stats_wrapper.statistics.collections.values()) - - return { - "total_puts": total_puts, - "total_gets": total_gets, - "total_deletes": total_deletes, - "get_hits": get_hits, - "get_misses": get_misses, - } - return {} + # Aggregate statistics across all collections (conversations) + total_puts = sum(coll_stats.put.count for coll_stats in self.stats_wrapper.statistics.collections.values()) + total_gets = sum(coll_stats.get.count for coll_stats in self.stats_wrapper.statistics.collections.values()) + total_deletes = sum(coll_stats.delete.count for coll_stats in self.stats_wrapper.statistics.collections.values()) + get_hits = sum(coll_stats.get.hit for coll_stats in self.stats_wrapper.statistics.collections.values()) + get_misses = sum(coll_stats.get.miss for coll_stats in self.stats_wrapper.statistics.collections.values()) + + return { + "total_puts": total_puts, + "total_gets": total_gets, + "total_deletes": total_deletes, + "get_hits": get_hits, + "get_misses": get_misses, + }examples/web_scraper_cache/scraper.py (4)
76-87: Enforce size limit after encryption to cap actual stored bytes.The current wrapper order checks size before encryption. Since Fernet adds overhead (~57 bytes + padding), entries near 5MB plaintext will exceed the limit after encryption.
Apply this diff to move the size check inside the encryption wrapper:
primary_with_wrappers = TTLClampWrapper( - key_value=LimitSizeWrapper( - key_value=FernetEncryptionWrapper( - key_value=FallbackWrapper(primary_key_value=disk_store, fallback_key_value=fallback_store), + key_value=FernetEncryptionWrapper( + key_value=LimitSizeWrapper( + key_value=FallbackWrapper(primary_key_value=disk_store, fallback_key_value=fallback_store), + max_size=5 * 1024 * 1024, # 5MB limit (applies to encrypted bytes) + ), - fernet=fernet, - raise_on_decryption_error=False, # Return None on decryption failure instead of raising - ), - max_size=5 * 1024 * 1024, # 5MB limit + fernet=fernet, + raise_on_decryption_error=False, # Return None on decryption failure instead of raising ), min_ttl=3600, # 1 hour minimum max_ttl=7 * 24 * 3600, # 7 days maximum )
97-109: Consider normalizing URLs before hashing to reduce duplicate keys.Hashing raw URLs treats trivial differences (scheme case, default ports, fragments, trailing slashes) as different keys, potentially causing unnecessary cache misses.
Apply this diff to add URL normalization:
+from urllib.parse import urlsplit, urlunsplit + ... def _url_to_key(self, url: str) -> str: """ Convert URL to a safe cache key. - Uses SHA-256 hash to create a consistent, filesystem-safe key. + URLs are normalized before hashing to treat trivial variations + (case, default ports, fragments) as identical. Args: url: The URL to convert Returns: Hashed key string """ - return hashlib.sha256(url.encode()).hexdigest() + normalized = self._normalize_url(url) + return hashlib.sha256(normalized.encode()).hexdigest() + + def _normalize_url(self, url: str) -> str: + """Normalize URL to reduce duplicate cache entries.""" + parts = urlsplit(url) + scheme = parts.scheme.lower() + netloc = parts.netloc.lower() + + # Remove default ports + if scheme == "http" and netloc.endswith(":80"): + netloc = netloc.rsplit(":", 1)[0] + elif scheme == "https" and netloc.endswith(":443"): + netloc = netloc.rsplit(":", 1)[0] + + # Normalize path and drop fragment + path = parts.path or "/" + return urlunsplit((scheme, netloc, path, parts.query, ""))
162-173: Consider using a lightweight existence check instead of fetching full values.
`is_cached()` currently calls `get_cached_page()`, which fetches, decrypts, and deserializes the entire payload just to check presence. For large pages, this is inefficient.

If the underlying `KeyValue` interface exposes an `exists()` or `has()` method, use it here for O(1) presence checks. Otherwise, you might want to add such a method to `PydanticAdapter`:

```python
async def is_cached(self, url: str) -> bool:
    """
    Check if a page is cached.

    Args:
        url: Page URL

    Returns:
        True if page is cached, False otherwise
    """
    key = self._url_to_key(url)
    # If adapter/store supports exists():
    return await self.adapter.exists(collection="pages", key=key)
    # Otherwise, keep current implementation as fallback
```

Check if the `KeyValue` interface or `PydanticAdapter` provides an existence check method:

```bash
#!/bin/bash
# Search for exists/has methods in the key-value library

# Look for exists or has method definitions
ast-grep --pattern $'class $CLASS { $$$ async def exists($$$) { $$$ } $$$ }'
ast-grep --pattern $'class $CLASS { $$$ async def has($$$) { $$$ } $$$ }'

# Also check with ripgrep
rg -nP --type=py 'async def (exists|has)\s*\(' -A 3
```
136-147: Consider defensive error handling for deserialization failures.

While decryption errors return `None` (line 81), Pydantic validation errors could still raise. Adding a try/except would make the method more resilient.

Apply this diff:

```diff
 async def get_cached_page(self, url: str) -> ScrapedPage | None:
     """
     Retrieve a cached page.

     Args:
         url: Page URL

     Returns:
         ScrapedPage if found and valid, None otherwise
     """
     key = self._url_to_key(url)
-    return await self.adapter.get(collection="pages", key=key)
+    try:
+        return await self.adapter.get(collection="pages", key=key)
+    except Exception:
+        logger.exception(f"Failed to retrieve cached page {url}")
+        return None
```
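A narrower variant would catch only Pydantic's `ValidationError`, so genuine store or I/O failures still propagate to the caller; a hypothetical sketch, assuming the adapter surfaces Pydantic's validation errors unchanged:

```python
# Tighter guard (sketch): only schema-mismatch errors map to a cache miss.
from pydantic import ValidationError

async def get_cached_page_strict(adapter, key: str):
    try:
        return await adapter.get(collection="pages", key=key)
    except ValidationError:
        return None  # cached blob no longer matches the ScrapedPage schema
```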
examples/trading_data/requirements.txt (1)

1-2: Remove duplication; use pyproject.toml as single source of truth.

These dependencies duplicate lines 7-8 in `pyproject.toml`. Having both files creates a maintenance burden and risks version drift. Since `pyproject.toml` is the modern standard (PEP 621), remove this file and use `pip install -e .` or `pip install -e ".[dev]"` instead.

examples/trading_data/trading_app.py (2)
78-98: Fix filesystem portability and write "latest" alias.

Two issues need addressing:

1. Windows filesystem incompatibility: Line 94 uses `isoformat()`, which includes colons (e.g., `2024-10-26T12:34:56+00:00`), and line 96 uses `f"symbol:{symbol}"` with a colon. Both are invalid for Windows file paths when using `DiskStore`.
2. Missing "latest" alias: `get_latest_price()` expects a "latest" key (line 128) that is never written, making that method non-functional.

Apply this diff:

```diff
+@staticmethod
+def _collection_for(symbol: str) -> str:
+    """Return filesystem-safe collection name."""
+    return f"symbol-{symbol.replace('/', '_').replace(':', '_')}"
+
 async def store_price(self, symbol: str, price: float, volume: int, ttl: int | None = None) -> str:
     """
     Store price data for a symbol.

     Args:
         symbol: Trading symbol (e.g., "AAPL", "BTC-USD")
         price: Current price
         volume: Trading volume
         ttl: Time-to-live in seconds (optional)

     Returns:
         Price data ID (timestamp-based)
     """
     price_data = PriceData(symbol=symbol, price=price, volume=volume, timestamp=datetime.now(tz=timezone.utc))

-    # Use timestamp as key for chronological ordering
-    data_id = price_data.timestamp.isoformat()
+    # Filesystem-safe, lexicographically sortable key
+    data_id = price_data.timestamp.strftime("%Y%m%dT%H%M%S%fZ")

-    await self.adapter.put(collection=f"symbol:{symbol}", key=data_id, value=price_data, ttl=ttl)
+    col = self._collection_for(symbol)
+    await self.adapter.put(collection=col, key=data_id, value=price_data, ttl=ttl)
+    # Maintain "latest" pointer for get_latest_price()
+    await self.adapter.put(collection=col, key="latest", value=price_data, ttl=ttl)

     return data_id
```
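To illustrate the suggested key format in isolation (standalone sketch): the keys are colon-free, and their lexicographic order matches chronological order:

```python
from datetime import datetime, timezone

t1 = datetime(2024, 10, 26, 12, 0, 0, tzinfo=timezone.utc)
t2 = datetime(2024, 10, 26, 12, 0, 1, tzinfo=timezone.utc)
k1 = t1.strftime("%Y%m%dT%H%M%S%fZ")  # "20241026T120000000000Z" — no colons
k2 = t2.strftime("%Y%m%dT%H%M%S%fZ")
assert ":" not in k1 and k1 < k2  # Windows-safe and sortable
```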
100-141: Apply filesystem-safe collection naming to all methods.

These methods use `f"symbol:{symbol}"`, which is incompatible with Windows file paths. Use the `_collection_for()` helper method suggested in the previous comment.

Apply this diff:

```diff
 async def get_price(self, symbol: str, data_id: str) -> PriceData | None:
     """
     Retrieve price data for a symbol.

     Args:
         symbol: Trading symbol
         data_id: Price data identifier (timestamp)

     Returns:
         PriceData if found, None otherwise
     """
-    return await self.adapter.get(collection=f"symbol:{symbol}", key=data_id)
+    return await self.adapter.get(collection=self._collection_for(symbol), key=data_id)

 async def get_latest_price(self, symbol: str) -> PriceData | None:
     """
     Get the most recent price data for a symbol.

-    Note: This is a simplified implementation. In production, you'd want
-    to maintain a separate "latest" key or use a time-series database.
+    Note: store_price maintains a "latest" alias per symbol for fast lookups.

     Args:
         symbol: Trading symbol

     Returns:
         Latest PriceData if available, None otherwise
     """
-    # This is a demonstration - in production you'd track the latest key
-    # For now, this just returns None to show the API pattern
-    return await self.adapter.get(collection=f"symbol:{symbol}", key="latest")
+    return await self.adapter.get(collection=self._collection_for(symbol), key="latest")

 async def delete_price(self, symbol: str, data_id: str) -> bool:
     """
     Delete price data.

     Args:
         symbol: Trading symbol
         data_id: Price data identifier

     Returns:
         True if deleted, False if not found
     """
-    return await self.adapter.delete(collection=f"symbol:{symbol}", key=data_id)
+    return await self.adapter.delete(collection=self._collection_for(symbol), key=data_id)
```
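For reference, the helper's behavior in isolation (standalone sketch of the `_collection_for` suggested above):

```python
def _collection_for(symbol: str) -> str:
    """Return filesystem-safe collection name."""
    return f"symbol-{symbol.replace('/', '_').replace(':', '_')}"

assert _collection_for("AAPL") == "symbol-AAPL"
assert _collection_for("BTC/USD") == "symbol-BTC_USD"  # no path separators on disk
```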
examples/trading_data/pyproject.toml (1)

17-18: Pytest asyncio config added; consider explicit testpaths for consistency.

The `asyncio_mode = "auto"` setting has been properly added. For completeness, you could add `testpaths = ["."]` to make test discovery explicit, but this is optional since pytest will auto-discover tests in the current directory.

Optional addition:

```diff
 [tool.pytest.ini_options]
 asyncio_mode = "auto"
+testpaths = ["."]
```
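With auto mode, async tests need no per-test marker; a sketch of what a test in these examples can look like (assuming the `cache` fixture from `test_trading_app.py`):

```python
# No @pytest.mark.asyncio needed — asyncio_mode = "auto" collects this as-is.
async def test_store_and_get_price(cache):
    data_id = await cache.store_price("AAPL", 187.5, 1_000)
    price = await cache.get_price("AAPL", data_id)
    assert price is not None and price.price == 187.5
```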
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (9)
- examples/chat_app/chat_app.py (1 hunks)
- examples/chat_app/pyproject.toml (1 hunks)
- examples/chat_app/requirements.txt (1 hunks)
- examples/trading_data/pyproject.toml (1 hunks)
- examples/trading_data/requirements.txt (1 hunks)
- examples/trading_data/trading_app.py (1 hunks)
- examples/web_scraper_cache/pyproject.toml (1 hunks)
- examples/web_scraper_cache/requirements.txt (1 hunks)
- examples/web_scraper_cache/scraper.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Use Ruff for Python formatting and linting (configured in pyproject.toml)
Python line length is 140 characters
Use Basedpyright in strict mode for static type checking
Files:
- examples/trading_data/trading_app.py
- examples/chat_app/chat_app.py
- examples/web_scraper_cache/scraper.py
🧬 Code graph analysis (3)
examples/trading_data/trading_app.py (1)
examples/trading_data/test_trading_app.py (1)
cache (11-15)
examples/chat_app/chat_app.py (2)
examples/trading_data/trading_app.py (1)
main (173-229)

examples/web_scraper_cache/scraper.py (1)
main (202-271)
examples/web_scraper_cache/scraper.py (1)
examples/web_scraper_cache/test_scraper.py (1)
cache (12-17)
🔇 Additional comments (23)
examples/chat_app/requirements.txt (1)
1-2: LGTM! Dependencies are properly versioned and consistent with the project configuration. The Pydantic constraint prevents unexpected major version upgrades.
examples/chat_app/pyproject.toml (4)
1-9: LGTM! Project metadata and dependencies are well-configured. The Pydantic version constraint correctly prevents major version upgrades, and dependencies match requirements.txt.

11-15: LGTM! Dev dependencies are appropriate for testing async code. The versions are recent and properly constrained.

17-18: LGTM! The pytest asyncio auto mode configuration ensures async tests are properly collected and executed without explicit marks. Good for test suite usability.

20-25: LGTM! Build system configuration is standard and appropriate for this example project.
examples/chat_app/chat_app.py (6)
1-21: LGTM! Module documentation and imports are clear and well-organized. The docstring effectively describes the example's purpose and key features.

23-29: LGTM! ChatMessage model is clean and appropriately typed for its purpose.

39-59: LGTM! The initialization properly constructs the wrapper stack and maintains a direct reference to the statistics wrapper. The TTL constraints (1-24 hours) are reasonable for a chat application.

87-112: LGTM! The get_message and delete_message methods are clean wrappers around the adapter operations with appropriate type hints and documentation.

138-183: LGTM! The demo function is well-structured and comprehensive. Logging configuration is correctly placed inside the function rather than at import time, avoiding global side effects.

185-186: LGTM! Standard and clean entry point for the async demo.
examples/web_scraper_cache/pyproject.toml (1)
1-26: LGTM! Project configuration is well-structured. The pytest asyncio configuration has been properly added (lines 18-19), and all dependencies are correctly specified.
examples/web_scraper_cache/scraper.py (7)
1-28: LGTM! Module structure and imports are well-organized. The logging configuration has been correctly moved to `main()` (line 205), addressing the previous concern about import-time side effects.

31-37: LGTM! Clean data model. The `ScrapedPage` model is properly structured with appropriate types.

111-134: LGTM! Proper error handling. The method correctly handles exceptions and provides clear success/failure feedback.
149-160: LGTM! Clean invalidation logic.
175-177: LGTM! Cleanup placeholder is appropriate for an example.
180-199: LGTM! Good simulation for demonstration purposes.
202-275: LGTM! Comprehensive demonstration with proper security practices. The main function correctly:
- Configures logging only when run directly
- Shows only key fingerprints, never raw keys (lines 210-211; see the sketch after this list)
- Demonstrates all major cache operations
- Ensures cleanup in finally block
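The fingerprint practice is easy to replicate; a minimal hypothetical sketch (the example project's exact logging may differ):

```python
# Log a short digest of the key — enough to correlate runs, never enough to decrypt.
import hashlib
from cryptography.fernet import Fernet

key = Fernet.generate_key()
fingerprint = hashlib.sha256(key).hexdigest()[:16]
print(f"Encryption key fingerprint: {fingerprint}")  # raw key is never printed
```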
examples/trading_data/trading_app.py (4)
1-34: LGTM! The module docstring clearly describes the patterns demonstrated, imports are correct, and the `PriceData` model is well-structured for type-safe storage.
45-76: Clean wrapper stack with direct statistics reference. The wrapper composition is clear and maintainable with an explicit reference to the `StatisticsWrapper`. The multi-tier caching architecture is well-documented.
143-166: LGTM! The return type correctly includes `float` for `hit_rate_percent`, and the aggregation logic properly handles division by zero. The statistics collection across all symbols is implemented correctly.
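That guard is the usual pattern for ratio metrics; in isolation (sketch):

```python
def hit_rate_percent(hits: int, gets: int) -> float:
    # Define the rate as 0.0 before any gets, rather than raising ZeroDivisionError.
    return (hits / gets) * 100.0 if gets else 0.0

assert hit_rate_percent(0, 0) == 0.0
assert hit_rate_percent(3, 4) == 75.0
```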
168-230: LGTM! The logging configuration is correctly placed inside `main()` rather than at module import time, avoiding interference with library code and tests. The demonstration flow comprehensively exercises the cache operations.
examples/web_scraper_cache/requirements.txt

```diff
@@ -0,0 +1,3 @@
+py-key-value-aio[disk,memory]>=0.2.8
+pydantic>=2.0.0,<3.0.0
+cryptography>=41.0.0
```
🧩 Analysis chain
Verify cryptography version for security advisories.
Ensure that cryptography>=41.0.0 is current and free from known vulnerabilities.
Run the following script to check for the latest version and any security advisories:
🏁 Script executed:
```bash
#!/bin/bash
# Description: Check for security advisories and latest version of cryptography package

# Check PyPI for latest version
echo "Latest cryptography version:"
curl -s https://pypi.org/pypi/cryptography/json | jq -r '.info.version'

# Check for security advisories
echo -e "\nSecurity advisories:"
gh api graphql -f query='
{
  securityVulnerabilities(first: 10, ecosystem: PIP, package: "cryptography") {
    nodes {
      advisory {
        summary
        severity
        publishedAt
      }
      vulnerableVersionRange
      firstPatchedVersion {
        identifier
      }
    }
  }
}'
```

Length of output: 2494
Update cryptography requirement to 41.0.2 or later due to HIGH severity vulnerability.
The current requirement `cryptography>=41.0.0` allows vulnerable versions. Version 41.0.0 is affected by a HIGH severity SSH certificate mishandling vulnerability (fixed in 41.0.2). Update to at least `cryptography>=41.0.2`, or consider `cryptography>=46.0.3` for the latest stable release.
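Beyond pinning in requirements, a runtime guard can catch stale environments; a minimal sketch (assumes the third-party `packaging` library is installed):

```python
from importlib.metadata import version
from packaging.version import Version

installed = Version(version("cryptography"))
assert installed >= Version("41.0.2"), f"cryptography {installed} is vulnerable; upgrade"
```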